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1. Introduction 

In numerous applications, the relation of observable data Y and the (unknown) signal of 
interest can be modeled as an inverse linear regression problem. We shall assume that the 
data Y = {1^} is sampled on the equidistant grid X = {1, . . . , m}'^, with m, d G N and that 

& U for some linear space U, such as the Euclidean space or a Sobolev class of functions. 
Hence the model can be formalized as 

Y^ = {Ku^)^ + e^, X. (1) 

Here we assume that e = {ci/jiygx independent and identically distributed r.v. with 
E (e^) = and E {el) = cr^ > (white noise). Moreover, K : U ^ (M™)'^ denotes a linear 
operator that encodes the functional relation between the quantities that are accessible by 
experiment and the underlying signal. Often the operator K does not have a continuous 
inverse (or its inverse is ill-conditioned in a discrete setting, where K \s & matrix), that is 
estimation of given the data Y is an ill-posed problem. As a consequence, estimators for 
vP can not be obtained by merely applying the inverse of K to an estimator of Ku^, in 
general. Instead, more sophisticated statistical regularization techniques have to be employed 
that, loosely speaking, are capable of simultaneously inverting K and solving the regression 
problem. 

The application we primarily have in mind is the reconstruction of low-dimensional signals 
(e.g. images) which are presumed to exhibit a strong neighborhood structure as it is 
characteristic for imaging or signal detection problems. These neighborhood relations are 
often modeled by prior smoothness or structural assumptions on (e.g. on the texture of 
an image). 

The aim of this paper is twofold. First, we will introduce the broad class of statistical 
multiresolution estimators (SMRE). We claim that numerous regularization techniques, that 
were recently proposed for different problems in various branches of applied mathematics and 
statistics, can be considered as special cases of these. Among others, this includes the Dantzig 
selector (see [1, 0, [1^ and references therein) that was recently proposed in the context of 
high dimensional statistics. Our prior focus, however, will be put on imaging problems and 
it will turn out that the aforementioned neighborhood relations can be modeled within our 
SMRE framework in a straightforward manner. This will result in locally adaptive and fully 
automatic image reconstruction methods. 

The high intrinsic structure of the signals that are typically under consideration in imaging 
is in contrast to the usual situation in high-dimensional statistics. Here is usually assumed 
to be unstructured but to have a sparse representation with respect to some basis of U 
(cf. 0, S, 113] )• Consequently, the consistent estimation of is realized by minimizing 
a regularization functional which fosters sparsity, such as the £i-norm of the coefficients, 
subject to an ^oo-constraint on the coefficients of the residual, i.e. 

inf 11^x11, s.t. \\K*{Y-Ku)\\^<q. (2) 

In order to apply this approach for image reconstruction, two modifications become necessary: 
Often one aims to minimize other regularization functionals such as the total variation semi- 
norm (cf. (33. [37I]) or Sobolev norms, say. Hence, we suggest to replace the ^i-norm in ([2]) by 
a general convex functional J that models the smoothness or texture information of signals 
or images (cf. P, [IB])- Furthermore, we relax the ^00-constraint such that neighborhood 
relations of the image can be taken into account. This generalizes the Dantzig selector to 
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this task in a natural way and obviously increases estimation efficientcy. As we will layout in 
Paragraph II .2.21 this requires new algorithms to compute efficiently the resulting large scale 
optimization problem. 

1.1. Statistical Multiresolution Estimation. We will now introduce the announced class 
of estimators. To this end, let S be some index set and W = {w'^ : S E 5| be a set of given 

weight-functions on the grid X = {1, . . . , m}'^. A statistical multiresolution estimator (SMRE) 
(or generalized Dantzig selector), is defined as a solution of the constrained optimization 
problem 



inf Jiu) 

wee/ 



s.t. 



max 

5G5 



< 



(3) 



Here, J : U ^ M denotes a regularization functional that incorporates a priori knowledge on 
the unknown signal (such as smoothness) and A : (M™)'^ — )• (M™)'^ a possibly non-linear 
transformation. The constant q can be considered as a universal regularization parameter that 
governs the trade-off between regularity and data-fit of the reconstruction. In most practical 
situations q is chosen to be the a-quantile Qa of the multiresolution (MR) statistic T{e), where 
T : (R™)'^ — )• M encodes the inequality constraint in (l3|), i.e. 



T{v) 



max 

SG5 



V e 



t,m\d 



(4) 



To this end, we assume the distribution of T{e) to be (approximately) known. This can either 
be obtained by simulations or in some cases the limiting distribution can even be derived 
explicitly. The regularization parameter q then admits a sound statistical interpretation: 
Each solution of ^ satisfies 

P(j(n„) < J(u°)) > a 

i.e. the estimator Ua is more regular (in terms of J) than with a probability of at least 
a. To see this simply observe that the true signal satisfies the constraint in ([3]) with 
probability at least a. 

For a given estimator u of u^, the set W is assumed to be rich enough in order to catch 
all relevant non-random signals that are visible in the residual Y — Ku. Then, the average 
function 



l^s{v) 



E 

uGX 



(5) 



evaluated at v = Y — Ku is supposed to be significantly larger than q for at least one 
CO G W, whenever Y — Ku fails to resemble white noise. Put differently, the MR-statistic 
T{Y — Ku) is bounded by q, whenever Y — Ku is accepted as white noise according to the 
resolution provided by W. In fact, this is a key observations that reveals numerous potential 
application areas of the estimation method ([3]). The examples we have in mind are mainly 
from statistical signal detection and imaging, where the index set S is typically chosen to 
be an overlapping (redundant) system of subsets of the grid X and uj^ is the normalized 
indicator function on S" G 5. Consequently the inequality constraint in ([3]) guarantees that 
the residual resembles white noise on all sets S £ S. In other words, the SMRE approach in 
([3]) yields a reconstruction method that locally adapts the amount of regularization according 
to the underlying image features. We illustrate this in Section [3] by various examples. 
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Summarizing, the optimization problem in ([3|) amounts to choose the most parsimonious 
among ah estimators u for which the residual Y — Ku resembles white noise according to the 
statistic T. If y — Ku contains some non randon signal, i.e. T{Y — Ku) is likely to be larger 
than q and u happens to lie outside the admissible domain of ([3]). Thus, the multi-resolution 
constraint prevents too parsimonious reconstructions due to the minimization of J. 

1.2. Algorithmic Challenges and Related Work. 

1.2.1. Multiresolution Methods. SMREs and related MR statistics have recently been studied 
in various contexts. We give a brief (but incomplete) overview. 

Classical MR statistics are obtained from the general form in by setting U = (M™)"' 
and A = Id. Moreover, one considers the system W to contain indicator functions on cubes. 
To be more precise, define the index set S to be the system of all d-dimensional cubes in X 
and set u)^ = Xs/VWS- Then, the MR-statistic in (j3|) reduces to 



T{v) = max ■ 



This statistic was introduced in [44] (called scanning statistic there) in order to detect a signal 
against a noisy background. It was shown in [sH that 

iim — = a a.s. 

m.-^oo -^2dlog m 

If the system S is reduced to the set of all dyadic squares, then it was proved in [2^ that 
(after suitable transformations) T also converges weakly to the Gumbel distribution. There, 
the authors also established a method for locally adaptive image denoising employing linear 
diffusion equations with spatially varying diffusivity. SMREs ^ have been studied recently 
for the case d = 1 in (l^ and 0], where total- variation penalty and the number of jumps 
in piecewise constant regression were considered as regularization functional J, respectively. 



In 2l| consistency and convergence rates for SMREs have been studied in a general Hilbert 
space setting. 

SMREs with squared residuals, that is A.{v)i, = v'^, yield another class of estimators that 
have attracted much attention. Above all, the situation where S consists of the single set X 
and co^ is chosen to be the constant 1 function is of special interest, since then ([3]) reduces 
to the penalized least square estimation. In particular ([3]) then can be rewritten into 

MJ{u) + XY,{Ku-Y)l (6) 

for a suitable multiplier A > 0. If J{u) = \\u\\-^ the LASSO estimator will result (cf. [i^]). 
Recently, also non-trivial choices of S were considered. In 5 is chosen to consist of a 
partition of G which is obtained beforehand by a Mumford-Shah segmentation. In [l^, a 
subset S C X is fixed and afterwards S is defined as the collection of all translates of S. 



In 171] MR-statistics are used for shape-constrained estimation based on testing qualitative 
hypothesis in nonparametric regression for d = 1. Here, the weight functions uj'^ incorporate 
qualitative features such as monotonicity or concavity. Similarly, MR-statistics are used in 
jl8l | in order to detect locations of local increase and decrease in density estimation. Much 
in the same spirit is the work in [l^ where multiscale sign tests are employed for computing 
confidence bands for isotonic median curves. 
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As mentioned previously, the Dantzig- selector 0] is also covered by the general SMRE 
framework in ([3]). To see this, set U = W (with typically p ^ m), A = Id and define the 
weights 

= Kxs, SeS. 

Then, each solution of ([3]) can be considered as a generalized Dantzig selector. The matrix 
K G in this context is usually interpreted as design matrix of a high dimensional linear 

model. The classical Dantzig selector as introduced in 0] then results in the special case 
where S only consists of single-elemented subsets of {1, . . . ,p} and J is chosen to be the 
^i-regularization functional 

p 

J{u) = \\u\\^ = ^ \Ui\ . 

1=1 

Hence LASSO and Dantzig selector are uni-scale estimators which take into account the 
largest (S = {X}) and smallest (5 consists of all singletons in {1, . . . ,p}) scales, respectively. 
In this sense, they constitute two extreme cases of SMRE. 



1.2.2. Algorithmic Challenges. From a computational point of view, computing an SMRE 
amounts to solve the constrained optimization problem ([3]) which can be rewritten into 

inf J{u) s.t. ns{Y - Ku) < q, V(5 G S). (7) 

We note that in practical applications the number of constraints in ([7]) , that is the cardinality 
of the index set S, can be quite large (in Section [3.21 denoising of a 512 x 512 image results in 
more than 6 million inequalities). Moreover, the inequalities (even for the simplest case where 
A = Id) are mutually correlated. Both of these facts turn ([7]) into a numerically challenging 
problem and standard approaches (such as interior point or conjugate gradient methods) 
perform far from satisfactorily. 

The authors in 0, 15, 2^ approach the numerical solution of ([7]) by means of an analogon 



of dni) with spatially dependent multiplier A G (M™)'^, i.e. 

MJ{u)+Y,>^AKu-Y)l. 

Starting from a (constant) initial parameter A = Aq, the parameter A is iteratively adjusted by 
increasing it in regions which were poorly reconstructed before according to the MR-statistic 
T. This approach strongly depends on the special structure of S that allows a straightforward 
identification of each set S £ S with a unique point in the grid X. Put differently, it is not 
clear how to modify this paradigm in order to solve ([3]) for highly redundant systems S as we 
have it in mind. 

Recently a general algorithmic framework was introduced in [2] for the solutions of large- 
scale convex cone problems 

inf J(u) s.t. Ku-Y £lC 

where /C is a convex cone in some Euclidean space. The approach was realized in the software 
package Templates for First-Order Conic Solvers (TFOCS)^. The above formulation is very 



available at |http: //tf ocs . Stanford. edu/| 
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general and in order to recover ([7]) one has to consider the cone 



JC= {{v,q) e {W^y X 



<qy{SeS) 



The approach in 0] employs the dual formulation of the problem 

inf J*(K*v) + {Y,v) s.t. veJC* 

which involves the computation of the dual cone /C* (J* denotes the Legendre-Fenchel dual 
of J). This approach is particularly appealing for the uni-scale Dantzig selector since in this 
situation the cone /C coincides with the epi-graph of the -^°°-norm and hence its dual cone 
is straightforward to compute (it is the epi-graph of the £^-norm). As it is argued in [2], 
this approach is capable of computing Dantzig selectors for large scale problems in contrast 
to previous approaches such as standard linear programming techniques 0] or homotopy 



methods such as DASSO j29t | or [41[. As the authors stress, their approach works well in 
the case when /C is the epi-graph of a norm for which the projections onto IC* are tractable 
and computationally efficient. However, for the applications we have in mind (such as locally 
adaptive imaging r;construction), the approach in is only of limited use: In contrast to 
the aforementioned epi-graphs, the large number of (strongly dependent) constraints in ([7]) 
brings about a cone /C that on the one hand exhibits a tremendous amount of faces compared 
to the dimension of the image space dim{H) = md and that on the other hand is no longer 
symmetric w.r.t. to the (/-axis. Both of these facts turn the computation of dual cone fC* (or 
the projections onto it) into a most challenging problem, even in the simplest case when A is 
linear. 

The aim of this paper is to develop a general algorithmic framework that makes solutions 
of ([7]) numerically accessible for many applications. In order to do so we propose to introduce 
a slack variable in ([7]) and then use the alternating direction method of multipliers, an Uzawa- 
type algorithm that decomposes problem ([7]) into a J-penalized least squares problem for 
the primal variable and a orthogonal projection problem on the feasible set of ([7]) for the 
slack variable. This approach has the appealing effect that once an implementation for the 
projection problem is established, different regularization functionals J can easily be employed 
without changing the backbone of the algorithm. Our work is much in the same spirit as [33l |. 
which considered an alternating direction method for the computation of the Dantzig selector 
recently. In this case the computation of the occurring orthogonal projections are available 
in closed form, whereas in our applications this is not the case due to the aforementioned 
dependencies. 

In order to tackle the orthogonal projection problem we employ Dykstra's projection 
method 0] which is capable of computing the projection onto the intersection of convex 
bodies by merely using the individual projections onto the latter. The efficiency of the pro- 
posed method hence increases considerably if the index set S can be decomposed into "few" 
partitions that contain indices of mutually independent inequalities in ([7]). In particular, by 
this approach we will be able to compute classical SMRE (as introduced in l^, 21]) in d = 2 



space dimensions which to our knowledge has never been done so far. This puts us into the 
position to study the performance of such estimators compared with other benchmark meth- 
ods in locally adaptive signal recovery (such as adaptive weights smoothing cf. [i^]). As it 
will turn out in Section [3] it will outperform these visually as well as quantitatively. 
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1.3. Organization of the Paper. The paper is organized as follows: In Section [2] we in- 
troduce a general algorithmic approach for computing SMREs. We will rewrite ([7]) into a 
linearly constrained problem and compute a saddle point of the corresponding augmented 
Lagrangian by the alternating direction method of multipliers in Paragraph 12.21 Under quite 
general assumption, we prove convergence of the algorithm in Theorem 12.21 and give some 
qualitative estimates for the iterates in Theorem 12.41 One of the occurring minimization 
steps amounts to the computation of an orthogonal projection onto a convex set in Euclidean 
space. In Paragraph 12.31 this problem will be tackled by means of Dykstra's projection algo- 
rithm introduced in [5|. Finally, we illustrate the performance of some particular instances 
of SMREs in Section [3l we study problems in nonparametric regression, image denoising and 
deconvolution of fluorescence microscopy images and compare our results to other methods 
by means of simulations. 

2. Computational Methodology 

In this section we will address the question on how to solve the linearly constrained opti- 
mization problem ([7]). After discussing some notations and basic assumptions in Subsection 
12. 1^ we will reformulate the problem in Paragraph 12.21 such that the alternating direction 
method of multipliers (ADMM), a Uzawa-type algorithm, can be employed as a solution 
method. As an effect, the task of computing a solution of ([7]) is replaced by alternating 

i) solving an unconstrained penalized least squares problem that is independent of the MR- 
statistic T and 

ii) computing the orthogonal projection on a convex set in Euclidean space that is indepen- 
dent of J. 

This reveals an appealing modular nature of our approach: The regularization functional J 
can easily be replaced once a method for the projection problem is settled. For the latter we 
will propose an iterative projection algorithm in Paragraph 12.31 that was introduced by Boyle 
and Dykstra in 

2.1. Basic Assumptions and Notation. From now on, X will stand for the d-dimensional 
grid {1, . . . , ni}'^ and agree upon H = M'^ ~ (M™-)'^ being the space of all real valued functions 
z; : X — 7- M. Moreover, we assume that S denotes some index set and that W = jo;"^ : 5 € 5} 
is a collection of elements in H. For two elements v,w £ H we will use the standard inner 
product and norm 

{v,w) = Vj^Wi, and llfjl = ^y {v, v) 

respectively. Next, we assume that A : H ^ H is continuous such that A(0) = and that for 
all S S 5 the mapping 

is convex. With this notation, we can rewrite the average function in ^ in the compact form 

tisiv) = \{w^,Aiv))\ . 

We note, that it is not restricitve to consider more generaly A : H ^ with arbitrary 
N £ N. This could e.g. be useful for augmenting the constraint set of ([7]) with further 
constraints of different type. For the signal and image detection problems as studied in this 
paper, however, A is always a pointwise transformation of the residuals. Hence, we will restrict 
our considerations on the case when A : H ^ H. 
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Furthermore, we define C/ to be a separable Hilbert-space with inner product {■,-)u and 
induced norm \\-\\if. The operator K : U H is assumed to be linear and bounded and the 
functional J : [/ — t- M is convex and lower semi-continuous, that is 

{un}„aN ^ Un u £ U =^ J{u) < limiuf J(n„). 

Recall the definition of the MR-statistic in Throughout this paper we will agree upon 
the following 

Assumption A. i) For all y £ H there exists u £ U such that T{Ku — y) < q- 
ii) For all y £ H and c G M the set 



u £ U : max fis{Ku — y) + J{u) < c 
S GtS 



is hounded. 



Under Assumption [X] it follows from standard techniques in convex optimization, that a 
solution of d?]) exists. As we will discuss in Section [2.21 it even follows that a saddle point of 
the corresponding Lagrangian exists (cf. Theorem 12.11 below) . In this context Assumption El 
i) is often referred to as Slater's constraint qualification and is for instance satisfied if K{U) 
is dense in H. Moreover, Assumption |X] ii) will be needed in order to guarantee convergence 
of the algorithm for computing such a solution, as it is proposed in the upcoming section. 
This requirement is fulfilled if J is coercive i.e. 

lim J{u) = oo. 

|n|| jj— >oo 

In many applications U is some function space and J a gradient based regularization method, 
such as the total variation semi-norm (cf . Section 13. 2p . Then a typical sufficient condition 
for Assumption |A] ii) is that K does not annihilate constant functions. 

2.2. Alternating Direction Method of Multipliers. By introducing a slack variable v € 
H we rewrite ([7]) to the equivalent problem 

inf J{u) + G{v) subject to Ku + v = Y. (8) 
Here, G denotes the characteristic function on the feasible region C of ([7]), that is, 
C = {v£H : ^is{v) < g V(5 G S)} and G{v) = J ° v £ C 




Note that due to the assumptions on A, the set C is closed and convex. The technique of 
rewriting ([7|) into ([8|) is referred to as the decomposition- coordination approach, see e.g. Fortin 
&: Glowinski (2ol . Chap. 111]. There, Lagrangian multiplier methods are used for solving ([8]). 
To this end, we recall the definition of the augmented Lagrangian of Problem ([8]), that is 

Lx{u,v]p) = ^\\Ku + v-Y\\^ + J{u) + G{v)-{p,Ku + v-Y), A > 0. (10) 
2A 

The name stems from the fact that the ordinary Lagrangian 

L{u, v;p) = J{u) + G{v) - {p, Ku + V - Y) 
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is augmented by the quadratic penalty term (2A)~"'^ \\Ku + v — Y\\^ that fosters the fulfillment 
of the linear constraints in ([8]). The augmented Lagrangian method consists in computing a 
saddle point {u,v,p) of Lx, that is 

Lx(u,v;p) < Lx{u,v-p) < Lx{u,v;p), \/{{u,v,p) £ U x H x H) 

We note that each saddle point {u,v,p) of the augmented Lagrangian Lx is already a saddle 
point of L and vice versa and that in either case the pair (u, v) is a solution of ([8|) (and thus 
u is a desired solution of ([T|)). This follows e.g. from [20, Chap 3. Thm. 2.1]. Sufficient 
conditions for the existence of saddle points are usually harder to come up with. Assumption 
[Al summarizes a standard set of such conditions. 

Theorem 2.1. Assume that Assumption\^ holds. Then, there exists a saddle point {{i,v,p) 
ofLx. 

Proof. According to [H, Chap. Ill, Prop. 3.1 and Prop. 4.2] a saddle point of L exists, if 
there is an element uq € U such that G is continuous at Kuq — Y and that 

lim J(u) + G{Ku -Y) = oo. (11) 

||m||q->oo 

According to Assumption [A] i) and due to the continuity of A the first requirement is clearly 
satisfied. Further, the coercivity assumption (llip is a consequence of Assumption lAl ii) . □ 

We will use the Alternating Diretion Method of Multipliers (ADMM) (cf. Algorithm [1]) 
as proposed in [lO, Chap. Ill Sec. 3.2] for the computation of a saddle point of Lx (and 
hence of a solution of ([7])): Successive minimization of the augmented Lagrangian Lx w.r.t. 
the first and second variable followed by an explicit step for maximizing w.r.t. the third 
variable is performed. Convergence of this method is established in Theorem 12.21 which is 
a generalization of [ioi. Chap. Ill Thm. 4.1]. We note that the proof, as presented in the 
Appendix [A] allows for approximate solution of the individual subproblems. For the sake of 
simplicity, we present the Algorithm in its exact form. 

Theorem 2.2. Every sequence {("Ufc, ffc)}fc>i that is generated by Algorithm[l\ is bounded in 
U X H and every weak cluster point is a solution of ^ . Moreover, 

W^'^k + Vk- + \\K{Uk - Ufc_i)f < oo. 

fceN 

Remark 2.3. i) Theorem 1 2 . 2 1 implies . that each weak cluster point of {^ifc}fc>i is a solution 
of ([7]). In particular, if the solution of ([7|) is unique (e.g. if J is strictly convex), then 
Uk ■u"''. 

ii) Note in particular that ([12]) is independent of the choice of J, while ([T3]) is independent of 
the multiresolution statistic being used. This decomposition gives the proposed method 
a neat modular appeal: once an efficient solution method for the projection problem (jl2p 
is established (see e.g. Section [23]) . the regularization functional J in ([3]) can easily be 
replaced by providing an algorithm for the penalized least squares problem (|13|) . For most 
popular choices of J, problem ([T3]) is well studied and efficient computational methods 
are at hand (see for a extensive collection of algorithms and for an overview on 
MCMC methods). 

For a given tolerance r > 0, Theorem 12.21 implies that Algorithm [1] terminates and outputs 
approximate solution u\t\ and v\t\ of ([8]). However, the breaking condition in Algorithm [1] 



10 



STATISTICAL MULTIRESOLUTION DANTZIG ESTIMATION IN IMAGING 



Algorithm 1 Alternating Direction Method of Multipliers 
Require: Y £ H (data), A > (step size), r > (tolerance). 

Ensure: (u[T],t'[r]) is an approximate solution of® computed in /c[t] iteration steps. 
uq ^ 0(7 and vq = pQ Oh 
r <r- \\Kuq + fo - 5^11 and A; ^ 0. 
while r > T do 
k ^ k + 1. 

Vk ^ V where v £ C satisfies 

\\v-{Y + \pk-i-Kuk-i)\\' <\\v-{Y + \pk-i-Kuk-i)f y{v£C). (12) 



Uk u where u satisfies 

1 \\Ku -{Y + Xpk-i - Vk)f + \J{u) < \ \\Ku - (y + Apfc_i - Vk)f + AJ(n) ^(u G \J). 

(13) 



Pk ^ Pk-i - {Kuk + Vk- Y)/X. 
r ^ max(||Kufc + Vk - Y\\ , \\K{uk - Uk-i] 
end while 

u[t] Uk and v[t] ^ Vk and A:[r] ^ k. 



merely guarantees that the linear constraint in ([8]) is approximated sufficiently well. Moreover, 
we know from construction that v[t] G C, which implies G{v[t]) = 0. So, it remains to evaluate 
the validity of n[r]: 

Theorem 2.4. Let {u,v,p) G U x H x H be any saddle point of L\. Moreover, let t > and 
u[t] £ U be returend by Algorithm^ Then, 

f 411 KiiW -I- 2t\ 

< J{u\t\) - J{u) - {K*p, u[t] - n)^ < t f 6 ||p|| + " ^' j V(t > 0). 

The result in Theorem [23] shows how the accuracy of the approximate solution u[t] depends 
on r. Moreover, it reveals that choosing a small step size A in Algorithm [T] possibly yields a 
slow decay of the objective functional J. However, it follows from the definition of L\ in (jlOp 
that a small value for A fosters the linear constraint in ([8]). 

Corollary 2.5. Let the assumtions of Theorem 12.41 be satisfied. Moreover, assume that 
J is a quadratic functional, i.e. J{u) = ^ ||Lit||y, where V is a further Hilbert-space and 
L : U D D ^ V is a linear, densely-defined and closed operator. Then 

\\L{u[t]-u)\\ = 0{V^) V(r>0). 

Example 2.6 (Dantzig selector). As already mentioned in the introduction, SMRE (i.e. 
finding solutions of ([3])) reduces to the computation of Dantzig selectors for the particular 
setting d = 1, [/ = ]RP (with usually p ^ m) and 

J{u) = \\u\\^ . 

When applying Algorithm [1] the subproblem (|13p amounts to compute 
Uk G argmin^ \\Ku- {Y + Xpk^i - Ufc)]]^ + A ||n||^ . 
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This is the well known least absolute shrinkage and selection operator (LASSO) estimator 



-1 


i;GM™ : 








i<j<™ 



43( 1 . For the classical Dantzig selector, one chooses S = {1, . . . ,p} and defines for G 5 the 
weight io^ = Kx{s}- Hence, the subproblem (jl2p in this case consists in the orthonormal 
projection of 1^ = ^ + ^Pk~i — Kuk_i onto the set 

< q for 1 < S < p 

The implications of Theorem 12.41 in the present case are in general rather weak. If the saddle 
point u is known to be (S-sparse and when K restricted to the support of u is injective, then 
it can be shown that |n[r] — u\-^ = 0{t). 

We finally note that for this particular situation a slightly different decomposition than 
proposed in ()2.2p is favorable. To be more precise, define K = K^K and Y = K^Y and 
consider 

J{u) + G{v) subject to Ku — v = Y. 

where G is the characteristic function on the set {v £ H : \\v\\^ < q}. Algorithm [1] applied 
to this modified decomposition then results in the ADMM as introduced in [s^]- In this case 
the projection in step (jl2p has a closed from. 



2.3. The Projection Problem. Algorithm [T] resolves the constrained convex optimization 
problem ([7]) into a quadratic program ()12|) and an unconstrained optimization problem (llSp . 
The quadratic program (I12p in the k-th. step of Algorithm [1] can be written as a projection: 

inf \\v — Ifcll^ subject to fJ-si^) < Q V(s G S) (14) 
where Y/^. = Y + Xpk~i — Kuk~i- We reformulate the side conditions to 

v G C = Pi C5 where Cs = {v£H : fis{v) < q} ■ (15) 
ses 

The sets Cs are closed and convex and problem (jl4p thus amounts to compute the projection 
PcO^k) of Y/c onto the intersection C of closed and convex sets. According to this interpretation, 
we use Dykstra's projection algorithm as introduced in to solve (jl4p . This algorithm takes 
an element v £ H and convex sets Di, . . . , Dm C H as arguments. It then creates a sequence 
converging to the projection of v onto the intersection of the Dj by successively performing 
projections onto individual Dj's. To this end, let -Pd(') denote the projection onto D C H 
and Sd = Pd — Id be the corresponding projection step. Dykstra's method is summarized in 
Algorithm [2l 

A natural explanation of the algorithm in a primal-dual framework as well as a proof that 
the sequence {/i(Af, ^)}fcgp^ converges to Px>{h) in norm can be found in For the case 

when D constitutes a polyhedron even explicit error estimates are at hand (cf. (46l|): 

Theorem 2.7. Let {/ifc}^,gp^ be the sequence generated by Algorithm \M CLnd Pv{h) be the 
projection of the input h onto V. Then there exist constants p > and < c < 1 such that 
for all k £N 

\\hk-Pvm\ <Pc^ 

Remark 2.8. The constant c increases with the number M of convex sets which intersection 
form the set D that h is to be projected on. The convergence rate therefore improves with 
decreasing M. For further details and estimates for the constants p and c, we refer to [i^. 
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Algorithm 2 Dykstra's Algorithm 

Require: h £ H (data), Di, . . . , Dm C H (closed and convex sets) 
Ensure: A sequence {/ifcj^gp^ that converges strongly to Pv{h) where V = 0^=1 M^i 
ho,o ^ h 

for j = 1 to M do 

ho,j ^ Pojihoj^i) and Qoj ^ (/loj-i) 
end for 

hi /io,A/ and A; ^ 1 
for > 1 do 

for j = 1 to M do 

hk,j ^ PDj{hk,j-i — Qk-i,j) and Qk,j ^ So^ihkj-i — Qk~i,j) 
end for 

hk+i ^ hk,M and k k + 1 
end for 



Note that application of Dykstra's algorithm is particularly appealing if the projections 
can be easily computed or even stated explicitly, as it is the case in the following examples. 

Example 2.9. Assume that A = Id. Then the sets Cs are the rectangular cylinders 

Cs = {veH : \{u}'\v)\ < q} . 

The projection can therefore be explicitly computed as 

- sign {{u:^, v)) ^ - q) if ^,s{v) > q 



Pcsiv) 



else 



The left image in Figure [T] depicts an example for C for = M^. For a detailed geometric 
interpretation of the MR-statistic we also refer to [3^ . 

Example 2.10. Assume that A{v),^ = v"^. Then, it follows that v i— t- (^Lo^,A{v)'j is convex if 
and only if oj^ > for all u £ X. In this case, the sets Cs are elliptic cylinders 



Cs=lv€H : ^ u^y, < q I 



Moreover, if oj^ € {0, 1} for all u ^ X, then the projection Pcg can be explicitly computed as 



Pcsiv) 



^;:js;i(^«X{..s=i} + vx{ojs=oy if fis{v) > q 
V else 



The right image in Figure [T] depicts an example of C for H = 

A first approach to use Dykstra's algorithm to solve (|14p is to set M = and identify 



Dj with Cs for all j = 1, . . . ,M. In view of Remark 12.81 however, it is clearly desirable to 
decrease the number M of convex sets that enter Dykstra's algorithm. In order to do so, we 
take a slightly more sophisticated approach than the one just presented. We partition the set 
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S into Si, ... , Sm such that for all 5 / S" G Sj 

d 

u^±uj^ and — A(-)^ = 0,V(iy G 5,i> G 5) (16) 
and regroup {Cs}g^g into {Di, . . . , Dm} with 

Dj = n Cs. (17) 

ses, 

Given the projections Pc's^ the projection onto Dj can be easily computed: For v £ H identify 
the set 

Vj = {Se Sj : fis{v) > q} 
of indices for which v violates the side condition (llSp and set 

Pd,{v)=v- iPcs-ld)v 
S&Vj 

To keep M small, we choose Si C {1, . . . , N} as the biggest set such that (116p holds for 
all S,S ^ Si. We then choose ^2 <Z S\Si with the same property and continue in this way 
until all indices are utilized. While this procedure does not necessarily result into M being 
minimal with the desired property, it still yields a distinct reduction of N in many practical 
situations. We will illustrate this approach for SMREs in imaging in Section [3l 

3. Applications 

In this section we will illustrate the capability of Algorithm [1] for computing SMREs in 
some practical situations: in Secti on 13.11 we will study a simply one-dimensional regression 
problem as it was also studied in [1^, yet with a different penalty function J. In Section 
13.21 we illustrate how SMREs performs in image denoising. In both cases we compare our 
results to other methods. Finally, we will apply the SMRE technique to the problem of image 
deblurring in confocal fluorescence microscopy in Section 13.31 

Before we study the aforementioned examples, we clarify some common notation. We will 
henceforth assume that U = H = (W^)'^ with d = I (Section El]) and d = 2 (Sections [U and 
13. 3p . respectively. Moreover, we will employ gradient based regularization functionals of the 
form 

J{u) = TVp{u) := - XI l^^'^ll with p G {1, 2} (18) 
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where |-|2 is the Euchdean norm in and D denotes the forward difference operator defined 

by 

^ \ui,+ei -Uu \il<Vi<n-l 

(^""^^^ = |o else. 

For the case p = 2 the minimization problem (|13p amounts to solve an implicit time step 
of the d-dimensional diffusion equation with initial value {Y + \pk-i — v^) and time step size 
A. This can be solved by a simple (sparse) matrix inversion. 

For the case p = 1, TVi is better known as total-variation semi-norm. It was shown in 



[34( 1 (see also [2j] for similar results in the continuous setting) that the taut-string algorithm 
(as introduced in [13]) constitutes an efficient solution method for (fT3]l in the case d = 1. In 
the general case d > 1, we employ the fixed point approach for solving the Euler-Lagrange 
equations for (llSp described in [3] (see also (43 . Chap. 8.2.4]). We finally note that the 
functional TVi fails to be differentiable; a fact that leads to serious numerical problems 
when trying to compute the Euler-Lagrange conditions for (I13p . Hence, we will use in our 
simulations a regularized version of TVi defined by 



TVf{u) = ^^{Bu.)^ + P^ (19) 

for a small constant /? > 0. 

Evaluation. In order to evaluate the performance of SMREs, we will employ various distance 
measures between an estimator u and the true signal u^. On the one hand, we will use 
standard measures such as mean integrated squared error (MISE) and the mean integrated 
absolute error (MIAE) which are given by 




MISE = ^\ — ^{uu- ulf I and MIAE 




respectively. On the other hand, we also intend to measure how well an estimator u matches 
the "smoothness" of the true signal u*^, where smoothness is characterized by the regulariza- 
tion functional J. To this end, we introduce the symmetric Bregman divergence 

where VJ denotes the gradient of the regularization functional J. Clearly, DY"\u,u^) is 
symmetric and since J is assumed to be convex, also non-negative. However, the symmetric 
Bregman divergence usually does not satisfy the triangle inequality and hence in general does 
not define a (semi-) metric on [/ [^. The following examples shed some light on how the 
Bregman divergence incorporates the functional J in order to measures the distance of u and 

Example 3.1. Let J{u) = TVp as in (fT8]) . 

i) If p = 2 , then 

In other words, the symmetric Bregman distance w.r.t. to TV2 is the mean squared 
distance of the derivatives of u and u^. 
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ii) If p = 1, then 



m 



^ nr.- I , Ir. Oh ( . Dn^ Du° 

(|Dn,^| + \uu''\) ( 1 



^ (|Dn,| + |DuO|)(l-cos7.), 



m 

where denotes the angle between the level lines of u and at the point Xi,. Put 
differently, the symmetric Bregman divergence w.r.t the total variation semi-norm TVi 
is small if for sufficiently many points either both u and v9 are constant in a neigh- 
borhood of Xu or the level lines of u and v? through Xj^ are parallel. In practice rather 
TV^ in (|19|) (for a small /? > 0) instead of TVi is used in order to avoid singularities. 
Then, the above formulas are slightly more complicated. 

We will use the mean symmetric Bregman divergence (MSB) given by 

MSB = E(Z)7'"(n,n°)) 

as an additional evaluation method. In all our simulations we approximate the expectations 
above by the empirical means of 500 trials. 



Comparison with other methods. We will compare the SMREs to other regression methods. 
Firstly, we will consider estimators obtained by the global penalized least squares method: 

u{\) := argmin - > {ui, — Yf + XJ{u), A>0. (20) 

In particular, we focus on estimators u(A) that are closest (in some sense) to the true function 
. We call such estimators oracles. We define the L^- and Bregman-oracle by ii-ip. = ii{X2) 
and ub = ii{Xb), where 

A2 := E I argmin \\u^ — 'u(A)|| | and Ab := E | argmin L'j'™(u'^, u(A)) 
V A>o / V A>0 

respectively. Of course, oracles are not available in practice, since the true signal is 
unknown. However, they represent ideal instances within the class of estimators given by (j20p 
that usually perform better than any data-driven parameter choices (such as cross-validation) 
and hence may serve as a reference. 

Secondly, we also compare our approach to adaptive weights smoothing (AWS) jiol ] which 
constitutes a benchmark technique for data-driven, spatially adaptive regression. We compute 
these estimators by means of the official R-packagcl and denote them by u^^g, where ker £ 
{Gaussian, Triangle} decodes the shape of the underlying regression kernel. 



available aljhttp: //cran.r-project . org/web/packages/aws/index .html 
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3.1. Non-parametric Regression. In this section we apply the SMRE technique to a non- 
parametric regression problem in d = 1 dimensions, i.e. the noise model ([T]) becomes 

Y^ = ul + e,, iy = l,...,m, (21) 

where we assume that ejy are independently and normally distributed r.v. with E (eu) = 
and E (e^) = a^. The upper left image in Figure [2] depicts the true signal (solid line) and 
the data Y, with m = 1024 and a = 0.5. The application we have in mind with this example 
arises in NMR spectroscopy, where the NMR spectra provide structural information on the 
number and type of chemical entities in a molecule. In this context, we suggest to choose 
J = TV2, since the true signal is rather smooth (see [13] for examples where J is chosen 
to be the total variation of the first and second derivative) . 

Finally, we discuss the MR-statistic T in @. We choose A = Id and the index set S to 
consist of all discrete intervals with side lengths ranging from 1 to 100. For an interval S £ S 
we set uj^ = (#<S')~^/^X5- Thus, each SMRE solves the constrained optimization problem 



<q V(5 G S). (22) 



We choose q to be the a-quantile of the MR-statistic T, that is 



inf < Q G 




<q] >a} a G (0,1). (23) 



We note that except for few special cases (cf. jH, [sO]) closed form expressions for the dis- 
tribution of the MT-statistic T are usually not at hand. In practice one rather considers the 
empirical distribution of T where the variance can be estimated at a rate V md (cf. [ssl]). 

We will henceforth denote by a solution of ()22p with q = qa- As argued in Section [1.1) 
Ua is smoother (i.e. has smaller value TV2) than the true signal with a probability of at 
least a while it satisfies the constraint that the multiresolution statistic T does not exceed 
qa. This is a sound statistical interpretation of the regularization parameter a. 

Numerical results and simulations. In Figure [2] the oracles and ■ub, the AWS-estimators 
iia^s^"^'° and and u^.^^^^ as well as the SMRE uo.9, and no.5 are depicted. It is evident that 
the SMRE matches the smoothness of the true object much better than the other estimators 
while the essential features of the signal (such as peak location and peak height) are preserved. 
In particular, almost no additional local extrema are generated by our approach which stays 
in obvious contrast to the other methods. Moreover, we point out that the SMRE are quite 
robust w.r.t. the choice of the confidence level a. 

We verify this behavior by a simulation study in Table [TJ For different noise levels {a = 
0.1, 0.3 and 0, 5) we compare the MISE, MIAE and MSB. Additionally, we compute the mean 
number of local maxima (MLM) of u relative to the number of local maxima in (which 
is 11). Here u is any of the above estimators. Note that the latter measure (similar to the 
MSB) takes into account the smoothness of the estimators where a value MLM ^ 1 indicates 
too many local maxima and hence a lack of regularity whereas MLM < 1 implies severe 
oversmoothing. 

As it becomes apparent from Table [H the SMREs are performing similarly well when 
compared to the reference estimators as far as the standard measures MISE and MIAE are 
concerned. For small noise levels {a = 0.1) SMREs even prove to be superior. The distance 
measures MSB and MLM, however, are significantly smaller for SMREs which indicates that 
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Figure 2. Row-wise from top left: true signal (solid) with data Y, oracles 
u^2 and ub, AWS estimators u'^ws'^^^'^ and and the SMREs uo.g, no.75 

and uq.s- 

these meet the smoothness of the true object much better than the reference estimators 
(cf. Example 13.11 i)). All in all, the simulation results confirm our visual impressions above. 

Implementation Details. The current index set S results in an overall number of constraints 
in ([22]) of 

100 

#5 = J](1024 + = 97450. 

i=l 

As pointed out in Section 12.31 the efficiency of Dykstra's Algorithm can be increased by 
grouping independent side-conditions, that is side-conditions corresponding to intervals in S 
with empty intersection. For example, the system S can be grouped such that the intersection 
of the corresponding sets Di , . . . , Dm in (|17p form C with 



100 

M = ^i = 5050. 

i=l 
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a - 


= 0.1 








= 0.3 






MISE 


MSB 


MIAE 


MLM 


MISE 


MSB 


MIAE 


MLM 




0.009 


0.008 


0.071 


11.881 


0.046 


0.027 


0.156 


10.915 


U-B 


0.009 


0.007 


0.070 


11.700 


0.048 


0.026 


0.149 


10.359 


Triangle 
^aws 


0.007 


0.007 


0.048 


2.551 


0.040 


0.035 


0.112 


3.053 


^-J Gauss 
aws 


0.054 


0.040 


0.068 


1.971 


0.062 


0.041 


0.107 


2.230 


UO.9 


0.008 


0.004 


0.047 


1.336 


0.056 


0.019 


0.127 


1.273 


MO. 75 


0.007 


0.004 


0.044 


1.342 


0.050 


0.018 


0.121 


1.290 


Mo. 5 


0.007 


0.004 


0.043 


1.366 


0.046 


0.017 


0.116 


1.290 









= 0.5 






MISE 


MSB 


MIAE 


MLM 




0.091 


0.037 


0.213 


9.860 


Mb 


0.094 


0.036 


0.206 


9.135 


^ Triangle 
Maws 


0.078 


0.058 


0.162 


3.141 


Gauss 
aws 


0.079 


0.043 


0.149 


2.330 


M0.9 


0.134 


0.034 


0.207 


1.194 


M0.75 


0.120 


0.032 


0.196 


1.241 


M0.5 


0.109 


0.030 


0.186 


1.238 



Table 1. Simulation studies for one dimensional peak data set. 



In all our simulations we set r = 10~^ and A = 1.0 in Algorithm [T] which results in fc[r] ~ 100 
iterations and an overall computation time of approximately 20 minutes for each SMRE. We 
note, however, that more than 95% of the computation time is needed for the projection step 
(I12p and that a considerable speed up for the latter could be achieved by parallelization. 

3.2. Image denoising. In this section we apply the SMRE technique to the problem of 
image denoising, that is non-parametric regression in d = 2 dimensions. In other words, we 
consider the noise model (j2ip as in Section 13. 1|, where the index v ranges over the discrete 
square {1, . . . , m} . In Figure [3] two typical examples for images and noisy observations Y 
are depicted (m = 512 and g = 0.1, where is scaled between (black) and 1 (white)). 

We will use the total- variation semi-norm as regularization functional (/3 =10 ). 

Moreover, we choose A to be defined as 

A{v)^=vl, V(i/G l,...,m2). (24) 

The index set S is defined to be the collection of all discrete squares with side lengths ranging 
from 1 to 25 and we set uj^ = csXs with yet to be defined constants cg. Thus, each SMREs 
solves the constrained optimization problem 

inf rvf(n) s.t. y^cs{Y-u)l<q y{S € S). (25) 

We agree upon q = 1 and specify the constants cs- To this end, compute for s = 1, . . . , 25 
the quantile values 

qa,s = milq£R : P max ^ < g > 1 - Q > a G (0, 1) 

and set cg = Q.a\s- other words, the definition of cs implies that the true signal satisfies 
the constraints in (j25p for squares of a fixed side length s with probability at least a. We 
will henceforth denote by Ua a solution of ()25p . We remark on this particular choice of the 
parameters ujs below. 
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Numerical results and simulations. In Figures H] and [5] the oracles and ub, the AWS- 
estimators mIws"^^'° and u^^^^ as weU as the SMRE uo.g are depicted (for the "cameraman" 
and "roof" test image respectively). It is rather obvious that the L^-oracles are not favorable: 
although relevant details in the image are preserved, smooth parts (as e.g. the sky) still 
contain random structures. In contrast, the estimator 'u^^**** preserves smooth areas but looses 
essential details. The aws-estimator with triangular kernel performs much better, however, 
it gives piecewise constant reconstructions of smoothly varying portions of the image, which 
is clearly undesirable. The SMRE and the Bregman-oracle visually perform superior to the 
other methods. The good performance of the Bregman-oracle indicates that the symmetric 
Bregman distance is a good measure for comparing images. In contrast to the Bregman-oracle, 
the SMRE adapts the amount of smoothing to the underlying image structure: constant image 
areas are smoothed nicely (e.g. sky portions), while oscillating patterns (e.g. the grass part 
in the "cameraman" image or the roof tiles in the "roof" image) are recovered. 

We evaluate the performance of the SMREs by means of a simulation study. To this end, we 
compute the MISE, MIAE and MSB and compare these values with the reference estimators. 
We note, however, that in particular the MISE and MIAE are not well suited in order to 
measure the distance of images for they are inconsistent with human eye perception. In jisj ] 
the structural similarity index ( SSIM) was introduced for image quality assessment that takes 
into account luminance, contrast and structure of the images at the same time. We use the 
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Figure 4. Reconstructions "cameraman" (row-wise from top left): L^-oracle 
■Ul2, Bregman-oracle ub, AWS estimators tiJws'"^''' and m^<^'^'^^*°, and SMRE 

Uo.9- 

author's implementation which is normalized such that the SSIM lies in the interval [—1,1] 
and is 1 in case of a perfect match. We denote by MSSIM the empirical mean of the SSIM in 
our simulations. 

In Table [2] the simulation results are listed. A first striking fact is the good performance of 
the L^-oracle w.r.t. the MISE and MIAE which is supposed to imply reconstruction properties 



'available at |https : //www. ece .uwaterloo . ca/-z70wang/research/ssiin/ 
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superior to the other methods. Keepmg m mind the visual comparison in Figures U] and O 
however, this is rather questionable. On the other hand, it becomes evident that the L^-oracle 
has a rather poor performance w.r.t. the MSB which is more suited for measuring image 
distances. It is therefore remarkable that the SMRE performs equally good as the Bregman- 
oracle which, in contrast to the SMRE, is not accessible (since is usually unknown). As far 
as the structural similarity measure MSSIM is concerned our approach proves to be superior 
to all others. Finally, the simulation results indicate that aws estimation is not favourable for 
denoising of natural images. 
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"cameraman" 


"roof" 




MISE 


MSB 


MIAE 


MSSIM 


MISE 


MSB 


MIAE 


MSSIM 




0.0017 


0.0314 


0.0276 


0.7739 


0.0029 


0.0499 


0.0383 


0.6700 


Ub 


0.0023 


0.0256 


0.0275 


0.7995 


0.0038 


0.0405 


0.0391 


0.6607 


^ Triangle 
^aws 


0.0032 


0.0482 


0.0308 


0.7657 


0.0046 


0.0702 


0.0416 


0.6205 


Gauss 
^aws 


0.0046 


0.0470 


0.0360 


0.7284 


0.0053 


0.0686 


0.0457 


0.5668 


UO.9 


0.0021 


0.0252 


0.0297 


0.8024 


0.0033 


0.0374 


0.0407 


0.7003 



Table 2. Simulation studies for the test images "cameraman" and "roof". 



Notes on the choice of A and oj^ . In general, a proper choice of the transformation A and of 
the weight-functions oo'^ can be achieved by including prior structural information on the true 
image to be estimated. Substantial parts of natural images, such as photographs, consists 
of oscillating patterns (as e.g. fabric, wood, hair, grass etc.). This becomes obvious in the 
standard test images depicted in Figure [3l We claim that for signals that exhibit oscillating 
patterns, a quadratic transformation A as in (|24|) is favorable, since it yields (compared to 
the linear statistic studied in Section 13. ip a larger power of the local test statistic on small 
scales. 

In order to illustrate this, we simulate noisy observations Y of the test images u in Figure 
[3] as in (|2ip with o" = 0.1 and compute a global estimator u by computing a minimizer of the 
ROF-functional (I20p (with A = 0.1). We intend to examine how well over-smoothed regions 
in u are detected by the MR-statistic T{Y — u) as in ([4]) with two different average functions 

(cf. dSD) 



and 



/^2,s(t^) = Yli 



respectively. For the sake of simplicity we restrict for the moment our considerations on 
the index set 5 of all 5 x 5 sub-squares in {1, . . . , m}^. In Figure [6] the local means iXi^s of 
the residuals v = Y — u for the "roof" -image are depicted. To be more precise, the center 
coordinate of each square € 5 is colored according to ^i^s-, Hence, large values indicate 
locations where the estimator u is considered over-smoothed according to the statistic. It 
becomes visually clear that the localization of oversmoothed regions is better for ^2,5- This 
is a good motivation for incorporating the local means of the squared residuals in the SMRE 
model dl]). 




Figure 6. Local means ^i,s (left) and ^2,5 (right)) of the residuals for "roof" image. 
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We finally comment on the choice of cg. Since e,^ are independent and normally distributed 
random variables, the (scaled) average function 

.-v.(.) = E(^)' 

is distributed with #5 degrees of freedom. Note that the distribution of (t~'^iis{£) is 
identical only for sets S of the same scale #5". As a consequence of this, it is likely that 
certain scales dominate the supremum in the MR-statistic T which spoils the multiscale 
properties of our approach. As a way out, we compute normalizing constants for each scale 
separately. 

An alternative approach would be to search for transformations that turn Hsi^) into almost 
identically distributed random variables. Logarithmic and p-root transformations are often 
employed for this purpose (see e.g. [IH]). This will be investigated separately. 

Implementation Details. The current index set S results in an overall number of constraints 
in ([25]) of 

25 

#S = ^{512 - i + 1)2 = 6251300. 

1=1 

Again by grouping independent side-conditions, the system S can be grouped such that the 
intersection of the corresponding sets Di, . . . , Dm in (fT7|) form C with 

25 

M = ^i'^ = 5525. 

i=l 

In all our simulations we set r = 10~^ and A = 0.25 in Algorithm [1] which results in A;[r] w 30 
iterations and a overall computation time of approximately 2 hours for each SMRE. Hence, 
parallelization is clearly desirable in this case. 

3.3. Deconvolution. Another interesting class of problems which can be approached by 
means of SMREs are deconvolution problems. To be more precise, we assume that is a 
convolution operator, that is 



{Ku)i, = {k* u)i, = ^ 



rUr 



where fc is a square-summable kernel on the lattice 1/ and u ^ H is extended by zero-padding. 
We will focus on the situation where A; is a circular Gaussian kernel with standard deviation 
a given by 



1 Sti-? , , 

A:^ = ^ , e (26 

With Z = Y + Xpk-i + Vk, the primal step (fT3l) in Algorithm [1] amounts to solve 

Uk ^ argmin ^ {{Ku)i, - Zu)"^ + AJ(ii), 

where we choose J to be as in (jlSp and apply the techniques described in |44|] for the numerical 
solution. 

In order to illustrate the performance of our approach in practical applications, we give an 
example from confocal microscopy, nowadays a standard technique in fluorescence microscopy 
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(cf. [39|]). When recording images with this kind of microscope, the original object gets blurred 
by a Gaussian kernel (in first order). The observations (photon counts) can be modeled as a 
Poisson process, i.e. 

= Poiss((Ku°)^), ueX. (27) 



The image depicted in Figure 7(a) shows a recording of a PtK2 cell taken from the kidney of 



potorous tridactylus. Before the recording, the protein /3-tubulin was tagged with a fluorescent 



marker such that it can be traced by the microscope. The image in 7(a) shows an area of 
18 X 18 pm^ at a resolution of 798 x 798 pixel. The point spread function of the optical 
system can be modeled as a Gaussian kernel with full width at half maximum of 230nm, 
which corresponds to o" = 4.3422 in (I26p . 

Note that (I27p does not fall immediately into the range of models covered by ([T]) . We will 
adapt the present situation to the SMRE methodology described in Section [1] by standard- 
ization and consider instead of ([3]) the modified problem 

/Y - Ku\ 

inf J(n) s.t. Ti^=^)<l (28) 



where the division is understood pointwise. Clearly, the problem of finding a solution of 
(|28p is much more involved than solving ([3]) for the constraints being nonconvex: firstly, the 
functional G as defined in ([9]) is nonconvex as a consequence of which the convergence result 
in Theorem 12.21 does not apply and secondly Dykstra's projection algorithm as described in 
Section [2] cannot be employed. 

We propose the following ansatz in order to circumvent this problem: instead of projecting 
onto the intersection C of sets Cs as described in p^ . we now project in the fc-th step of 
Algorithm [1] onto 

N 

Cp[k] = fl Cp,s[k] where Cp,s[k] = {v e H : fig (t-//^) < q} ■ 

n=l 

with a pointwise division by the square root of Kuk- Put differently, in the A;-th step of 
Algorithm [1] we use the previous estimate of as a lagged standardization in order to 
approximate the constraints in ([28]) . In fact, we use y^max{Kuk, e) with a small number 
e > for standardization, in order to avoid instabilities. 

We note that while with this modification Dykstra's algorithm becomes applicable again, 
the projection problem (jl2p now changes in each iteration step of Algorithm [TJ As a conse- 
quence. Theorem 12.21 does not hold anymore after this modification, either. So far, we have 
not come up with a similar convergence analysis. 

We compute the SMRE UQ_g by employing Algorithm [1] with the modifications described 
above. As in the denoising examples in Section [32] the index set S consists of all squares with 
the side- lengths {1, . . . , 25} and we choose = xs and A = Id. We note, that this results 
in an overall number of = 95 436 200 inequality constraints. The constant q are chosen 
as in (|23p . where we assume that Si, are independent and standard normally distributed r.v. 

In Algorithm [1] we set A = 0.05 and compute 100 steps. We observe that after a few 
iterations (~ 15) the error r falls below 10~^ and almost stagnates thereafter, which is due 
to the fact that we do not increase the accuracy in the subroutines for (|13p and (|12p . Each 
iteration step in Algorithm [1] approximately takes 10 minutes, where 90% of the computation 
time is needed for (fT3l) . The result is depicted in Figure |7(b)[ 

The benefits of our method are twofold: 
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(a) Fluorescence microscopy data (b) SMRE uo.g'- fully automated 

of a PtK2 cell in potorous tridacty- and locally adaptive deconvolution 

lus kidney. The bright filaments of microscopy data, 
indicate the location of the protein 
/3-tubulin. 



Figure 7. Reconstruction of confocal microscopy data. 

i) The amount of regularization is chosen in a completely automatic way. The only param- 
eter to be selected is the level a in (|23p . Note that the parameter A in Algorithm [1] has 
no effect on the output (though it has an effect on the number of iterations needed and 
the numerical stability). 

ii) The reconstruction has an appealing locally adaptive behavior which in the present ex- 
ample mainly concerns the gaps between the protein filaments: whereas the marked 
/3-tubulin is concentrated in regions of basically one scale, the gaps in between actually 
make up the multiscale nature of the image. 




(a) STED microscopy recording of (b) Detail comparison between 
the PtK2 cell data set. confocal recording (left), SMRE 

Mo. 9 (middle) and STED recording 

(right). 



Figure 8. Comparison with high-resolution STED microscopy data. 
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In the present situation we are in the comfortable position to have a reference image at hand 
by means of which we can evaluate the result of our method: STED (STimulated Emission 
Depletion) microscopy constitutes a relatively new method, that is capable of recording images 
at a physically 5-10 times higher resolution as confocal microscopy (see (26l. \2^). Hence a 
STED image of this object may serve as "gold standard" reference image. 



Figure 8(a) depicts a STED recording of the PtK2 cell data set in Figure 7(a) The compari- 
son of the SMRE uo.g with the STED recording in Figure [8(b) [ shows that our SMRE technique 
chooses a reasonable amount of regularization: no artifacts due to under-regularization are 
generated and on the other hand almost all relevant geometrical features that are present in 
the high-resolution STED recording become visible in the reconstruction. In particular, we 
note that filament bifurcations (one such bifurcation is marked by a black box in Figure [8(b) [ ) 
become apparent in our reconstruction that are not visible in the recorded data. 

Finally, we mention that aside to standardization, other transformations of the Poisson 
data (j27p could possibly be considered. For example Anscombe's transformation is known 
to yield reasonable approximations to normality even for low Poisson-intensities and hence 
has a particular appeal for e.g. microscopy data with low photon-counts. We are currently 
investigating SMREs that employ Anscombe's transform, where in particular the arising 
projection problems are challenging. 



4. Conclusion and Outlook 

In this work, we propose a general estimation technique for nonparametric inverse regres- 
sion problems in the white noise model ([T]) based on the convex program ([7]). It amounts to 
finding a minimizer of a convex regularization functional J{u) over a set of feasible estimators 
that satisfy the fidelty condition T{Y — Ku) < q, where T is assumed to be the maximum 
over simple convex constraints and q is some quantile of the statistic T(e). Any such mini- 
mizer we call statistical multiresolution estimator (SMRE). This approach covers well known 
uni-scale techniques, such as the Dantzig selector, but with a vast field of potentially new 
application areas, such as locally adaptive imaging. The particular appeal of the multi-scale 
generalization arises for those situations where a "neighboring relationship" within the signal 
can be employed to gain additional information by "averaging" neighboring residuals. We 
demonstrate in various examples that this improvement is drastic. 

We approach the numerical solution of the problem by the ADMM (cf. Algorithm [1]) 
that decomposes the problem into two subproblems: A J-penalized least squares problem, 
independent of T, and an orthogonal projection problem onto the feasible set of d?]) that is 
independent of J. The first problem is well studied and for most typical choices of J fast 
and reliable numerical approaches are at hand. The projection problem, however, is computa- 
tional demanding, in particular for image denoising applications. We propose Dykstra's cyclic 
projection method for its approximate solution. Finally, by extensive numerical experiments, 
we illustrate the performance of our estimation scheme (in nonparametric regression, image 
denoising and deblurring problems) and the applicability of our algorithmic approach. 

Summarizing, this paper is meant to introduce a novel class of statistical estimators, to 
provide a general algorithmic approach for their numerical computation and to evaluate their 
performance by numerical simulations. The inherent questions on the asymptotic behaviour 
of these estimators (such as consistency, convergence rates or oracle inequalities) remain — to 
a large extent — unanswered. This opens an interesting area for future research. 



STATISTICAL MULTIRESOLUTION DANTZIG ESTIMATION IN IMAGING 



27 



A first attempt has been made in 2l[ where it is assumed that the model space U B is 
some Hilbert-space of real valued functions on some domain Q and that K : U ^ L^(f2) is 
linear and bounded. The error model ([T]) then has to be adapted accordingly. When y is a 
Gaussian process on L^(0) with mean Ku'^ and variance cr^ > 0, consistency and convergence 



rates for SMREs as cj —t- 0"*" have been proved in [2l|] for the case when A = Id. However, 
in order to extend these results to the present setting, one would rather work with a discrete 
sample of Kuq on the grid X and then consider the case when the number of observations 



= md tends to infinity. The previous analysis in [2]| indicates two major aspects that have 
to be considered in the asymptotic analysis for SMREs: 

(a) As A^ — 7- oo usually the cardinality of the index set S (and hence of the set of weight 
functions W) gets unbounded. Thus, the mutliresolution statistic T(e) = Ti\/{e) in ([4]) 
is likely to degenerate unless it is properly normalized and W satisfies some entropy 
condition. In the linear case (A = Id) we utilized a result from that guarantees a.s. 
boundedness of T]\[{e). 

(b) In order to derive convergence rates (or risk bounds) it is well known that the true 
signal has to satisfy some apriori regularity conditions. When using general convex 
regularization functionals J, this is usually expressed by the source condition 

K*p^ G 5J(u°), for some p° G L'^{n). 

Here K* denotes the adjoint of K and dJ the (generalized) derivative of J. For example, 

if J{u) = ^ ||u|| , then this conditions means that € van{K*). 
It would be of great interest to transfer and extend the results in [2l[ to the present situation. 
It is to be expected that (a) and (a) above are necessary assumptions for this purpose. 

As stressed by the referees, other extensions are of interest and will be postponed to future 
work. In contrast to imaging, in many other applications the design X is random, rather 
than fixed. In these situations an obvious way to extend our algorithmic framework would 
be to select suitable partitions S according to the design density, i.e. with finer resolution at 
locations with a high concentration of design points. It also remains an open issue how to 
extend the SMRE methodology to density estimation rather than regression, in particular in 
a deconvolution setup. For d = 1 a first step in this direction has been taken in and it 
will be of great interest to explore whether our approach allow this to be extended to d > 2. 
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Appendix A. Proofs 

In this section we shall give the proofs of Theorems 12.21 and 12.41 as well as Corollary 12.51 We 
note, that convergence of Algorithm [T] is a classical subject in optimization theory and a proof 
can e.g. be found in [13, Chap. Ill Thm. 4.1]. However, in order to apply these results, it is 
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necessary that certain regularity conditions for J hold, that are not realistic for our purposes 
(as e.g. in the case of total- variation regularization) . The assertions of Theorems 12.21 and 12.41 
are modifications of the standard results. 

Moreover, we will allow for approximate solution of the subproblems ()12p and (jl3p . To this 
end, we rewrite these two subproblems as variational inequalities, i.e. given {uk-i,Vk~i,Pk~i) 
we find {uk,Vk,Pk) such that 

G{v) - G{vk) + {Kuk-i + Vk-Y- Xpk-uv - Vk) > -efe, e H (29a) 
J{u) - J{uk) + {Kuk + Vk-Y- Xpk-i,Ku - Kuk) > -5k, £ U (29b) 
Pk = Pk-i - {Kuk + Vk- Y)/\, (29c) 

where we assume that {ei, £2, ■ ■ ■} and {5i, 82-, ■ ■ •} are given sequences of positive numbers. 
Note that (I29ap implies that G{vk) = and hence Vk G C and that for Sk = 5fc = (I29al) and 
(I29bp are equivalent to (fT2]) and (fT3]l . respectively. 

Finally, we remind the reader of the definition of the suhdifferential (or generalized deriv- 
ative) dF of a convex function F : y — t- M on a real Hilbert-space V: 

C£dF{v) ^ F{w)>F{v) + {S,,w-v)yy{weV). 

If ^ G dF{v), then ^ is called subgradient of F at v. It follows from (l9l, Chap III, Prop. 3.1 
and Prop. 4.1] that the Lagrangian L (and hence also the augmented Lagrangian L\) has a 
saddle-point {u,v,p) £U x H x H ii and only if 

Ku + v = Y, K*pedJ{u) and pedG{v). (30) 

We will henceforth assume that {{uk-,Vk,Pk)}k&'H ^ sequence generated by iteratively re- 
peating the steps (|29ap - ()29cp . Further, we introduce the notation 

Uk •■= Uk - u, Vk := Vk-v and pk := Pk - P- 

We start with the following 

Lemma A.l. For all A; > 1 we have that 

{\\Pk-if + \\Kuk-if) - [WPkf + WKukf^ 

> A"2 (^\\Kuk + vkf + \\Kuk-i - Kukf^ - 2\~\5k + <5fc_i) - 6k - Ek (31) 

Proof. The assertion follows by repeating the steps (5.6)-(5.25) in the proof of j2ol . Chap. Ill 
Thm. 4.1] after replacing (5.9) and (5.10) by ()29ap and (I29bp respectively. □ 

We continue with the proof of Theorem 12.21 More precisely, we prove the following gener- 
alized version 

Proposition A. 2. Assume that the sums X^^^ 5fc oi'^^'d X^^^i £fc t^^e finite. Then, the sequence 
{Wk,Vk)}i^-^i is hounded inUxH and every weak cluster point is a solution of Moreover, 

\\Kuk + Vk - + \\K{uk - Uk-i)\f < 00. 

fceN 
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Proof. Let k > 1 and define D = X^^^ (^fc and E = YlT=i^k- Summing up Inequality ([3T]) 
over k and keeping in mind that Ku^ + Vk = Ku^ + Vk — Y and Kuk-i — Kuk = Kuk-i — Kuk 
shows 

oo 

^ \\Kuk + Vk- Yf + \\Kuk-i - Kukf 

k=l 

< Wpf + \\Kuf + (4A~2 + 1)D + E <oo 

where we have used that uq = u and Po = p- Furthermore, it follows again from (j3ip that 

\\pkf + ||Kuf < \\pf + A-2 lli^uf + (4A-2 + 1)Z) + ^ < oo (32) 

This together with the fact that H-ftTufc + — y|| — )• shows that 

max(||i^Mfc|| , \\vk\\ , llpfcll) = 0(1). 

Together with the optimality condition for (|13p this in turn implies that for an arbitrary 
ue H 

J{uk) < J{u) + A-i {Kuk + Vk-Y- Xpk-i,Ku - Kuk) + 5k = 0{l). (33) 
Summarizing, we find that 

maxusiKuk — Y) + J{uk) < max \\A{Kuk — y)|| + J{uk) < c < oo 

for a suitably chosen constant c G R, since A is supposed to be continuous. Thus, it follows 
from Assumption lAl that {n^l^gp^ is bounded and hence sequentially weakly compact. Now, 
let {u,v,p) be a weak cluster point of {{uk,Vk,Pk)}keN recall that {u,v,p) was assumed 
to be a saddle point of the augmented Lagrangian L\. Setting u = u in (I33p thus results in 



J{uk) < J{u) + A ^ {Kuk + Vk -Y,Ku- Kuk) + {pk~i,Kuk - Ku) + 5k ^^^^ 
= J{u) + {pk~i,Kuk - Ku) + o(l) 
Using the relation Kii + v = Y we further find 

{pk-i,Kuk - Ku) = (pfc„i, Kuk - Y + v) 

= {Pk~i, Kuk +Vk-Y) - {pk-i,Vk -v) = o(l) - {pk~i,Vk - v) (35) 
From the definition of Vk in ([29a]) and from the fact that v,Vk G C it follows that 

{Y + Xpk-i - {Kuk-i +Vk),v- Vk) < £k 
which in turn implies that 

- {pk-i,Vk -v) < A"^ {Y - {Kuk-i + Vk),Vk -v) +ek 

= A~^ (y - {Kuk + Vk),Vk -v) + A"^ {Kuk - Kuk-i,Vk - v) + Sk = o(l) (36) 
Combining ([3l|) . ([35]) and ([36]) gives 

limsup J(nfc) < J(n). 

k—^oo 

Now, choose a subsequence {^^p(fe)}^gpj such that ttp(fc) ^ u. Since J is convex and lower semi- 
continuous it is also weakly lower semi-continuous and hence the previous estimate yields 

J(n) < liminf J(ii„n--)) < J{u). 
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Moreover, we have that Vp[k) ^ ^ for all k £ N. Since C is closed we conclude that v £ C. 
Since Ku + v = Y this shows that (u, v) solves ([8]) and thus J{u) = J{u). □ 

We proceed with the proof of Theorem 12. 4[ Again we present a generalized version. To 
this end, let D and E be as in Proposition IA.2i 

Proposition A. 3. There exists a constant C = C{X,u,v,p, E, D) such that 

< J{u[t]) - J{u) - {K*p, u[t] - u)u <Ct + 6klr] + Sklr] V(r > 0). 
Proof. Define = (4A~^ + l)D + E. Then it follows from ([32]) that 

\\Kuk - Ku\\ < \\p\\ + A"^ \\Ku\\ + B (37) 
\\pk\\<2\\p\\+X-^\\Ku\\+B, (38) 

where (n, v,p) is an arbitrary saddle point of Lx{u, v,p). Assume that r > and that k = /c[r] 
is such that 

max(||i^Mfc +Vk - Y\\ , \\Kuk-i - Kuk\\) < r. 
Then, it follows from dMI), dSS]), dSZD and ([M]) and that 

J (uk) < J{u) + {Kuk + Vk -Y,Ku- Kuk) + {pk-i,Kuk - Ku) + 5k 

< J{u) + r(||p|| + A-i \\Ku\\ + B) + Wpk^iW T + {pk-i,v - Vk) + 5k (39) 

< J{u) + r(3 IIpII + 2A~^ \\Ku\\ + 2B) + 2A~V H^fc - v\\ +5k + ek 

After observing that Kuk — Ku + Vk — v = Y \i follows that — {;|| < r + — Ku\\ and 

combining (p9]l and (|37|) gives 

J{uk) < J{u) + r (5 IIpII + 4A~^ \\Ku\\ + 45 + 2A~V) + 5k + Sk- 

Now, observe that from the definition of the subgradient and (j30p . it follows that J{uk) > 
J{u) + {K*p, Uk — u) and that (p, Vk — v) < 0. This and the fact that Kii + v = Y implies 
that 

< J{uk) - J{u) - {K*p,uk - u) 
= J (uk) - J{u) - (p, Kuk + Vk-Y) + {p,Ku + v -Y) + {p, Vk - v) 

< J{uk) - J{u) + IIpII r 

< T (6 IIpII + 4A"^ \\Ku\\ +AB + 2A~V) +5k + e^. 

This together with (j39p finally proves the first part of the assertion. □ 

Remark A. 4. For the case when Z) = = 0, it is seen from the proof of Proposition ()A.3P 
that the constant C takes the simple form 

„ A\\Ku\\+2t 



C = r 6 



A 



Proof of Corollary \2.5[ Assume that J(u) = ^ Then it follows (see e.g. [12, Lem. 

2.4]) that the subdifferential dJ{u) consists of the single element L*Lu. Hence the extremality 
relations (|30p imply that K*p = L*Lu. Now it is easy to observe that 



J{uk) - J{u) - {K*p,Uk -u) = ^ \\L{uk 



□ 
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